It is claimed: 

1. A method for retrieving answers to questions from an information retrieval
system comprising: 

generating a set of phrases that identify different categories of questions; 
generating candidate transforms for each phrase; 
weighting the candidate transforms; 
ranking the candidate transforms; 

applying the transforms to an information retrieval system. 

2. A method as in claim 1 further comprising: 

filtering candidate transforms prior to weighting. 

3. A method as in claim 2 wherein:

natural language processing techniques are used for filtering. 

4. A method as in claim 3 wherein: 

a natural language processing technique used is part-of-speech tagging.

5. A method as in claim 4 wherein:

Brill's part-of-speech tagger is used. 

6. A method as in claim 3 wherein: 

the natural language processing techniques used are feature selection
techniques.

7. A method as in claim 1 wherein:

the questions are categorized by similar goals. 

8. A method as in claim 7 wherein:

the categories are identified using an n-gram approach.

9. A method as in claim 8 wherein: 

phrases are generated by computing the frequency of all n-grams of length 
minQtokens to maxQtokens words, with all n-grams anchored at the beginning of the 
questions. 

10. A method as in claim 9 wherein: 

all n-grams that occur at least minQphraseCount times are used for
generating question phrases. 
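
By way of illustration only, the phrase generation recited in claims 9 and 10 might be sketched as follows. The function name, the whitespace tokenization, the lower-casing, and the default parameter values are assumptions made for this example and are not taken from the claims.

```python
from collections import Counter


def generate_question_phrases(questions, min_q_tokens=2, max_q_tokens=4,
                              min_q_phrase_count=2):
    """Count n-grams anchored at the start of each question; keep frequent ones."""
    counts = Counter()
    for question in questions:
        # Naive whitespace tokenization (an assumption of this sketch).
        tokens = question.lower().split()
        for n in range(min_q_tokens, max_q_tokens + 1):
            if len(tokens) >= n:
                # Every n-gram starts at the first token of the question.
                counts[" ".join(tokens[:n])] += 1
    return {p: c for p, c in counts.items() if c >= min_q_phrase_count}


questions = [
    "What is a hard disk",
    "What is a modem",
    "How do I install Linux",
    "How do I print a file",
    "Who is the president",
]
phrases = generate_question_phrases(questions)
```

With these five sample questions, only the prefixes shared by at least two questions survive the minQphraseCount threshold.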

11. A method as in claim 7 wherein: 

the input to generating the set of phrases is a set of questions. 

12. A method as in claim 7 wherein: 

the output of generating the set of phrases is a set of question phrases that
can be used to classify questions into respective question types. 

13. A method as in claim 1 further comprising: 

filtering the generated phrases.

14. A method as in claim 13 wherein: 

natural language processing techniques are used for filtering. 

15. A method as in claim 14 wherein: 

a natural language processing technique used is part-of-speech tagging.

16. A method as in claim 15 wherein:

Brill's part-of-speech tagger is used. 

17. A method as in claim 14 wherein:

the natural language processing techniques used are feature selection
techniques.

18. A method as in claim 1 wherein: 

generating candidate transforms comprises generating initial candidate 
transform phrases. 

19. A method as in claim 18 wherein:

initial candidate transforms are generated by using a collection of
Question/Answer pairs.

20. A method as in claim 19 further comprising: 

filtering initial candidate transform phrases. 

21. A method as in claim 20 wherein: 

initial candidate transforms are filtered by minimum co-occurrence.

22. A method as in claim 20 further comprising: 

weighting filtered initial candidate transforms. 

23. A method as in claim 22 further comprising: 

filtering all weighted initial candidate transforms. 

24. A method as in claim 19 wherein:

the collection of pairs has been tagged with a part-of-speech tagger. 



25. A method as in claim 24 wherein: 

Brill's part-of-speech tagger is used. 

26. A method as in claim 19 wherein:

initial candidate transform phrases are filtered by eliminating generated 
answer phrases that contain a noun. 
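
As a non-limiting sketch of the filtering in claim 26, the example below drops every candidate phrase in which any token carries a noun tag. It assumes the phrases have already been tagged (for instance by a Brill-style tagger) with Penn Treebank tags, where noun tags begin with "NN"; the function name and the input layout are hypothetical.

```python
def drop_phrases_with_nouns(tagged_phrases):
    """Keep only phrases in which no token has a Penn Treebank noun tag (NN*)."""
    kept = []
    for phrase in tagged_phrases:
        # Each phrase is a list of (word, tag) pairs produced by an external tagger.
        if not any(tag.startswith("NN") for _, tag in phrase):
            kept.append(" ".join(word for word, _ in phrase))
    return kept


tagged = [
    [("is", "VBZ"), ("a", "DT")],
    [("hard", "JJ"), ("disk", "NN")],
    [("refers", "VBZ"), ("to", "TO")],
]
filtered = drop_phrases_with_nouns(tagged)
```

In this sample, "hard disk" is removed because it contains a noun, while the two other phrases are retained.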

27. A method as in claim 19 wherein:

all potential answer phrases are generated from all of the words in the 
prefix of Answer for each Question/Answer pair where a prefix of Question matches each 
question phrase. 

28. A method as in claim 27 wherein: 

n-grams of length minAtokens to maxAtokens words are used, starting at
every word boundary in the first maxLen bytes of the Answer text. 

29. A method as in claim 28 wherein: 

from the resulting n-grams, the topKphrases with the highest frequency
counts are kept. 
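
The answer phrase generation of claims 27 to 29 might be sketched as follows, for illustration only. The sketch assumes the Answers passed in already belong to Question/Answer pairs whose Question matched a given question phrase. The function name, the UTF-8 byte truncation, the whitespace tokenization, and the default values are assumptions of this example; truncating at a byte limit may cut the last word short.

```python
from collections import Counter


def generate_answer_phrases(answers, min_a_tokens=1, max_a_tokens=2,
                            max_len=40, top_k_phrases=3):
    """Count n-grams starting at every word boundary of each Answer prefix."""
    counts = Counter()
    for answer in answers:
        # Keep only the first max_len bytes of the Answer text.
        prefix = answer.encode("utf-8")[:max_len].decode("utf-8", errors="ignore")
        tokens = prefix.lower().split()
        for start in range(len(tokens)):
            for n in range(min_a_tokens, max_a_tokens + 1):
                if start + n <= len(tokens):
                    counts[" ".join(tokens[start:start + n])] += 1
    # Keep the top_k_phrases most frequent n-grams as (phrase, count) pairs.
    return counts.most_common(top_k_phrases)


answers = ["A modem is a device", "A router is a device that"]
top = generate_answer_phrases(answers, top_k_phrases=1)
all_counts = dict(generate_answer_phrases(answers, top_k_phrases=100))
```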

30. A method as in claim 19 wherein:

information retrieval techniques for term weighting are applied to rank the
initial candidate transforms. 

31. A method as in claim 30 wherein: 

a Sparck Jones inverse collection frequency weighting scheme that uses 
relevance information is applied. 



32. A method as in claim 30 wherein: 

candidate transforms are weighted by assigning to each phrase a
Robertson/Sparck Jones term weight with respect to a specific question type.

33. A method as in claim 30 wherein:

the weight is computed for each candidate transform tr_i by computing the
count of Question/Answer pairs where tr_i appears in the Answer to a question matching a
question phrase as the number of relevant documents;

considering the number of remaining Question/Answer pairs where tr_i
appears in the Answer as non-relevant; and

applying the formula

$$ w_i^{(1)} = \log \frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)}. $$
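
For illustration, the weight of claim 33 might be computed as in the sketch below, which assumes the standard logarithmic form of the Robertson/Sparck Jones relevance weight. The reading of the symbols is likewise an assumption based on that standard form: r is the number of relevant Question/Answer pairs whose Answer contains the transform, R the total number of relevant pairs, n the number of all pairs whose Answer contains the transform, and N the total number of pairs. The function name is hypothetical.

```python
import math


def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing."""
    # Odds that the transform appears in a relevant pair.
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    # Odds that the transform appears in a non-relevant pair.
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


# A transform concentrated in relevant pairs gets a positive weight.
discriminating = rsj_weight(r=3, R=4, n=5, N=20)
# A transform spread evenly over relevant and non-relevant pairs gets zero.
neutral = rsj_weight(r=1, R=2, n=2, N=4)
```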

34. A method as in claim 30 wherein:

term selection weights are computed for each candidate transform. 

35. A method as in claim 34 wherein: 

term selection weights, wtr_i, for each candidate transform tr_i are computed
as:

$$ wtr_i = qtf_i \cdot w_i^{(1)} $$

where qtf_i is the co-occurrence count of tr_i with QP, and w_i^{(1)} is the relevance-based term
weight of tr_i computed with respect to QP.
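
A minimal sketch of the term selection weight of claim 35 follows, assuming the co-occurrence counts and the relevance-based term weights have already been computed for one question phrase QP. The function name, the dictionary layout, and the sample values are hypothetical.

```python
def term_selection_weights(qtf, relevance_weight):
    """Multiply each transform's co-occurrence count by its relevance weight."""
    return {tr: qtf[tr] * relevance_weight[tr] for tr in qtf}


# Illustrative values only: co-occurrence counts with QP and relevance weights.
qtf = {"refers to": 4, "is a": 10}
relevance_weight = {"refers to": 2.0, "is a": 0.5}
wtr = term_selection_weights(qtf, relevance_weight)
ranked = sorted(wtr, key=wtr.get, reverse=True)
```

In this sample the less frequent transform ranks first because its relevance weight is higher, which is the effect the product is meant to capture.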

36. A method as in claim 19 further comprising:

sorting the initial candidate transforms into buckets according to the 
number of words in the transform phrase, and up to maxBucket transforms, with the 
highest values of term selection weights kept from each bucket. 
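
The bucketing of claim 36 might be sketched as follows, for illustration only. The function name, the whitespace word count, and the sample weights are assumptions of this example.

```python
from collections import defaultdict


def keep_top_per_bucket(weights, max_bucket=2):
    """Group transforms by word count; keep the max_bucket heaviest per group."""
    buckets = defaultdict(list)
    for phrase, weight in weights.items():
        buckets[len(phrase.split())].append((phrase, weight))
    return {
        n_words: sorted(items, key=lambda pw: pw[1], reverse=True)[:max_bucket]
        for n_words, items in buckets.items()
    }


weights = {"is": 1.0, "refers": 3.0, "means": 2.0, "is a": 5.0, "refers to": 8.0}
kept = keep_top_per_bucket(weights, max_bucket=2)
```

Bucketing by length keeps short, high-frequency transforms from crowding out longer ones, since each phrase length competes only within its own bucket.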

37. A method as in claim 36 further comprising: 

filtering and weighting the initial candidate transform prior to sorting. 

38. A method as in claim 1 wherein: 

ranking the candidate transforms comprises retrieving a set of 
Question/Answer pairs and for each pair and the candidate transforms, applying a 
transform to each Question. 

39. A method as in claim 38 further comprising: 

sorting Question/Answer pairs by increasing answer length prior to
ranking the candidate transforms. 

40. A method as in claim 38 wherein: 

the transforms are encoded so that they are treated as phrases by the 
information retrieval system. 

41. A method as in claim 38 wherein: 

a Question requires parts of the query in matching pages. 

42. A method as in claim 38 wherein: 

a Question does not require parts of the query in matching pages. 

43. A method as in claim 38 wherein:

multiple transformations are combined into a single query. 
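
One possible, non-limiting way to build the queries of claims 40 to 43 is sketched below. The query syntax is hypothetical: double quotes mark a phrase, a leading "+" marks a required term, and "OR" combines alternatives. Actual information retrieval systems differ in their syntax, and the function name is an assumption.

```python
def build_query(question_terms, transforms, require_terms=False):
    """Encode transforms as quoted phrases and combine them into one query."""
    parts = []
    quoted = ['"{}"'.format(t) for t in transforms]
    if len(quoted) == 1:
        parts.append(quoted[0])
    elif quoted:
        # Several transforms are combined into a single disjunctive query.
        parts.append("(" + " OR ".join(quoted) + ")")
    for term in question_terms.split():
        # Optionally require the remaining question terms in matching pages.
        parts.append("+" + term if require_terms else term)
    return " ".join(parts)


combined = build_query("hard disk", ["is a", "refers to"])
required = build_query("hard disk", ["is a"], require_terms=True)
```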

44. A method for retrieving documents from an information retrieval system comprising:

categorizing questions asked of the information retrieval system into
different types;

generating phrases that identify the question types; 
generating candidate query transformations for each phrase from a training 
set of question/answer pairs; 

evaluating the candidate transforms on the target information retrieval
systems; and

applying transformations to queries submitted to the information retrieval
system.

45. A method for retrieving documents as in claim 44 wherein: 

the questions are categorized by similar goals. 

46. A method for retrieving documents as in claim 44 wherein:

phrases are generated by computing the frequency of all n-grams of
length minQtokens to maxQtokens words, with all n-grams anchored at the beginning of
the questions. 

47. A method for retrieving documents as in claim 46 wherein:

all n-grams that occur at least minQphraseCount times are used for
generating candidate transforms.

48. A method for retrieving documents as in claim 44 further comprising:

filtering the generated phrases. 

49. A method for retrieving documents as in claim 48 wherein: 

natural language processing techniques are used for filtering. 

50. A method for retrieving documents as in claim 49 wherein: 

a natural language processing technique used is part-of-speech tagging.

51. A method for retrieving documents as in claim 50 wherein: 

Brill's part-of-speech tagger is used. 

52. A method for retrieving documents as in claim 49 wherein: 

the natural language processing techniques used are feature selection
techniques.

53. A method for retrieving documents as in claim 44 further comprising: 

filtering, weighting, and ranking the candidate query transformations prior 
to evaluating on the information retrieval systems. 

54. A method for retrieving documents as in claim 53 wherein: 

natural language processing techniques are used for filtering. 

55. A method for retrieving documents as in claim 54 wherein:

initial candidate transforms are filtered by minimum co-occurrence. 

56. A method for retrieving documents as in claim 53 wherein: 

generating candidate transforms comprises generating initial candidate 
transform phrases. 

57. A method for retrieving documents as in claim 44 further comprising:

filtering initial candidate transform phrases. 

58. A method for retrieving documents as in claim 44 wherein: 

the training set of pairs is tagged with a part-of-speech tagger.

59. A method for retrieving documents as in claim 44 wherein:

candidate transforms are filtered by eliminating phrases with nouns. 

60. A method for retrieving documents from an information retrieval system
comprising: 

entering a question whose answer is desired; 
classifying the question by matching it with predetermined question
phrases;

retrieving the associated question phrases; 

rewriting the question by applying each associated question phrase to the 
question to create transformed queries; 
submitting the transformed queries to an information retrieval system;

analyzing the returned documents; 
scoring the returned documents; 

ranking the returned documents by their respective scores; and
returning documents ranked above a predetermined level as the resulting retrieved
documents.
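
Purely as an illustrative sketch, the steps of claim 60 might be strung together as shown below. Several simplifications are assumptions of this example and are not recited in the claim: the question is classified by its longest matching prefix, each question phrase is associated with weighted transforms learned beforehand, a document's score is the sum of the weights of the transforms that retrieved it, and the predetermined level is a fixed cut-off top_n. The `search` callable stands in for any information retrieval system and is hypothetical.

```python
def answer_question(question, transforms_by_phrase, search, top_n=2):
    """Classify, rewrite, submit, score and rank, returning the top documents."""
    q = question.lower()
    # Classify: pick the longest predetermined question phrase that q starts with.
    matches = [p for p in transforms_by_phrase if q.startswith(p)]
    if not matches:
        return []
    phrase = max(matches, key=len)
    remainder = q[len(phrase):].strip()
    scores = {}
    for transform, weight in transforms_by_phrase[phrase]:
        # Rewrite: replace the question phrase with a quoted transform.
        query = '"{}" {}'.format(transform, remainder)
        for doc in search(query):
            # Score: accumulate the weight of each transform that found the doc.
            scores[doc] = scores.get(doc, 0.0) + weight
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_n]


# A stand-in retrieval system backed by a fixed table of results.
fake_index = {
    '"is a" modem': ["doc1", "doc2"],
    '"refers to" modem': ["doc2", "doc3"],
}
transforms = {
    "what is a": [("is a", 2.0), ("refers to", 1.0)],
    "what is": [("is", 0.5)],
}
result = answer_question("What is a modem", transforms,
                         lambda query: fake_index.get(query, []))
```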